{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<sub>&copy; 2021-present Neuralmagic, Inc. // [Neural Magic Legal](https://neuralmagic.com/legal)</sub> \n",
    "\n",
    "# Sparse-Quantized Transfer Learning in PyTorch using SparseML\n",
    "\n",
    "This notebook provides a step-by-step walkthrough for creating a performant sparse-quantized model\n",
    "by transfer learning the pruned structure from an already sparse-quantized model.\n",
    "\n",
    "Sparse-quantized models combine [pruning](https://neuralmagic.com/blog/pruning-overview/) and [quantization](https://arxiv.org/abs/1609.07061) to reduce both the number of parameters and the precision of the remaining parameters to significantly increase the performance of neural networks. Using these optimizations, your model will obtain significantly better (around 7x vs. unoptimized) performance at inference time using the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse).\n",
    "\n",
    "Sparse-quantized transfer learning takes two steps. [SparseML](https://github.com/neuralmagic/sparseml) recipes make it easy to perform these optimizations:\n",
    "- First, fine-tune a pre-trained sparse model for the transfer dataset while maintaining the pre-trained sparsity structure.\n",
    "- Second, perform [quantization-aware training (QAT)](https://pytorch.org/blog/introduction-to-quantization-on-pytorch/#quantization-aware-training) to quantize the now sparse model while still holding the same sparsity structure.\n",
    "\n",
    "In this notebook, you will:\n",
    "- Set up the model and dataset\n",
    "- Define a generic PyTorch training flow\n",
    "- Integrate the PyTorch flow with SparseML for transfer learning\n",
    "- Perform sparse transfer learning and quantization-aware training using the PyTorch and SparseML flow\n",
    "- Export to [ONNX](https://onnx.ai/) and convert the model from a QAT\n",
    "- [Optional] Compare DeepSparse Engine benchmarks of the final sparse-quantized model to an unoptimized model\n",
    "\n",
    "Reading through this notebook will be reasonably quick to gain an intuition for how to plug SparseML into your PyTorch training flow for transfer learning and generically. Rough time estimates for fully pruning the default model are given. Note that training with the PyTorch CPU implementation will be much slower than a GPU:\n",
    "- 30 minutes on a GPU\n",
    "- 90 minutes on a laptop CPU"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1 - Requirements\n",
    "\n",
    "To run this notebook, you will need the following packages already installed:\n",
    "* SparseML, SparseZoo\n",
    "* PyTorch (>= 1.7.0) and torchvision\n",
    "\n",
    "You can install any package that is not already present via `pip`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sparseml\n",
    "import sparsezoo\n",
    "import torch\n",
    "import torchvision\n",
    "\n",
    "assert torch.__version__ >= \"1.7\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2 - Setting Up the Model and Dataset\n",
    "\n",
    "By default, you will transfer learn from a sparse-quantized [ResNet-50](https://arxiv.org/abs/1512.03385) model trained on the [ImageNet dataset](http://www.image-net.org/) to the much smaller [Imagenette dataset](https://github.com/fastai/imagenette). The transfer learning weights are downloaded from the [SparseZoo](https://github.com/neuralmagic/sparsezoo) model repository.   The Imagenette dataset is downloaded from its repository via a helper class from SparseML.\n",
    "\n",
    "When loading weights for transfer learning classification models, it is standard to override the final classifier layer to fit the output shape for the new dataset.  In the example below, this is done by specifying `ignore_error_tensors` as the weights that will be initialzed for the new model.  In other flows this could be accomplished by setting `model.classifier.fc = torch.nn.Linear(...)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sparseml.pytorch.models import ModelRegistry\n",
    "from sparseml.pytorch.datasets import ImagenetteDataset, ImagenetteSize\n",
    "from sparsezoo import Model\n",
    "\n",
    "#######################################################\n",
    "# Define your model below\n",
    "#######################################################\n",
    "print(\"loading model...\")\n",
    "# SparseZoo stub to pre-trained sparse-quantized ResNet-50 for imagenet dataset\n",
    "zoo_stub_path = (\n",
    "    \"zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate\"\n",
    "    \"?recipe=transfer_learn\"\n",
    ")\n",
    "model = ModelRegistry.create(\n",
    "    key=\"resnet50\",\n",
    "    pretrained_path=zoo_stub_path,\n",
    "    pretrained_dataset=\"imagenette\",\n",
    "    num_classes=10,\n",
    "    ignore_error_tensors=[\"classifier.fc.weight\", \"classifier.fc.bias\"],\n",
    ")\n",
    "input_shape = ModelRegistry.input_shape(\"resnet50\")\n",
    "input_size = input_shape[-1]\n",
    "print(model)\n",
    "#######################################################\n",
    "# Define your train and validation datasets below\n",
    "#######################################################\n",
    "\n",
    "print(\"\\nloading train dataset...\")\n",
    "train_dataset = ImagenetteDataset(\n",
    "    train=True, dataset_size=ImagenetteSize.s320, image_size=input_size\n",
    ")\n",
    "print(train_dataset)\n",
    "\n",
    "print(\"\\nloading val dataset...\")\n",
    "val_dataset = ImagenetteDataset(\n",
    "    train=False, dataset_size=ImagenetteSize.s320, image_size=input_size\n",
    ")\n",
    "print(val_dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3 - Creating a PyTorch Training Loop\n",
    "SparseML can plug directly into your existing PyTorch training flow by overriding the Optimizer object. To demonstrate this, in the cell below, we define a simple PyTorch training loop adapted from [here](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html).  To prune and quantize your existing models using SparseML, you can use your own training flow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from tqdm.auto import tqdm\n",
    "import math\n",
    "import torch\n",
    "\n",
    "\n",
    "def run_model_one_epoch(model, data_loader, criterion, device, train=False, optimizer=None):\n",
    "    if train:\n",
    "        model.train()\n",
    "    else:\n",
    "        model.eval()\n",
    "\n",
    "    running_loss = 0.0\n",
    "    total_correct = 0\n",
    "    total_predictions = 0\n",
    "\n",
    "    for step, (inputs, labels) in tqdm(enumerate(data_loader), total=len(data_loader)):\n",
    "        inputs = inputs.to(device)\n",
    "        labels = labels.to(device)\n",
    "\n",
    "        if train:\n",
    "            optimizer.zero_grad()\n",
    "\n",
    "        outputs, _ = model(inputs)  # model returns logits and softmax as a tuple\n",
    "        loss = criterion(outputs, labels)\n",
    "\n",
    "        if train:\n",
    "            loss.backward()\n",
    "            optimizer.step()\n",
    "\n",
    "        running_loss += loss.item()\n",
    "\n",
    "        predictions = outputs.argmax(dim=1)\n",
    "        total_correct += torch.sum(predictions == labels).item()\n",
    "        total_predictions += inputs.size(0)\n",
    "\n",
    "    loss = running_loss / (step + 1.0)\n",
    "    accuracy = total_correct / total_predictions\n",
    "    return loss, accuracy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4 - Building PyTorch Training Objects\n",
    "In this step, you will select hyperparameters, a device to train your model with, set up DataLoader objects, a loss function, and optimizer.  All of these variables and objects can be replaced to fit your training flow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.utils.data import DataLoader\n",
    "from torch.nn import CrossEntropyLoss\n",
    "from torch.optim import Adam\n",
    "\n",
    "# hyperparameters\n",
    "BATCH_SIZE = 32\n",
    "\n",
    "# setup device\n",
    "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
    "model.to(device)\n",
    "print(f\"Using device: {device}\")\n",
    "\n",
    "# setup data loaders\n",
    "train_loader = DataLoader(\n",
    "    train_dataset, BATCH_SIZE, shuffle=True, pin_memory=True, num_workers=8\n",
    ")\n",
    "val_loader = DataLoader(\n",
    "    val_dataset, BATCH_SIZE, shuffle=False, pin_memory=True, num_workers=8\n",
    ")\n",
    "\n",
    "# setup loss function and optimizer, LR will be overriden by sparseml\n",
    "criterion = CrossEntropyLoss()\n",
    "optimizer = Adam(model.parameters(), lr=8e-3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 5 - Running Sparse-Quantized Transfer Learning with a SparseML Recipe\n",
    "\n",
    "To run sparse-quantized transfer learning with SparseML, you will download a transfer learning recipe from SparseZoo and use it to create a `ScheduledModifierManager` object.  This manager will be used to wrap the optimizer object to maintain the pre-optimized model's sparsity structure while learning weights for the new dataset as well as performing quantization-aware training (QAT).\n",
    "\n",
    "You can create SparseML recipes to perform various model pruning schedules, QAT, sparse transfer learning, and more.  If you are using a different model than the default, you will have to modify the recipe  file to match the new target's parameters.\n",
    "\n",
    "Using the wrapped optimizer object, you will call the training function to prune your model. Finalize the model after training by making a call to manager's `finalize(...)` method.\n",
    "\n",
    "If the kernel shuts down during training, this may be an out of memory error; to resolve this, try lowering the `batch_size` in the cell above."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Downloading a Recipe from SparseZoo\n",
    "The [SparseZoo](https://github.com/neuralmagic/sparsezoo) API provides preconfigured recipes for its optimized model.  In the cell below, you will download a recipe for pruning ResNet-50 on the Imagenette dataset and record its saved path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sparsezoo import Model\n",
    "\n",
    "zoo_model = Model(zoo_stub_path)\n",
    "recipe_path = zoo_model.recipes.default.path\n",
    "print(f\"Recipe downloaded to: {recipe_path}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sparseml.pytorch.optim import ScheduledModifierManager\n",
    "\n",
    "# create ScheduledModifierManager and Optimizer wrapper\n",
    "manager = ScheduledModifierManager.from_yaml(recipe_path)\n",
    "optimizer = manager.modify(model, optimizer, steps_per_epoch=len(train_loader))\n",
    "\n",
    "\n",
    "# Run model pruning\n",
    "epoch = manager.min_epochs\n",
    "for epoch in range(manager.max_epochs):\n",
    "    # run training loop\n",
    "    epoch_name = f\"{epoch + 1}/{manager.max_epochs}\"\n",
    "    print(f\"Running Training Epoch {epoch_name}\")\n",
    "    train_loss, train_acc = run_model_one_epoch(\n",
    "        model, train_loader, criterion, device, train=True, optimizer=optimizer\n",
    "    )\n",
    "    print(\n",
    "        f\"Training Epoch: {epoch_name}\\nTraining Loss: {train_loss}\\nTop 1 Acc: {train_acc}\\n\"\n",
    "    )\n",
    "\n",
    "    # run validation loop\n",
    "    print(f\"Running Validation Epoch {epoch_name}\")\n",
    "    val_loss, val_acc = run_model_one_epoch(model, val_loader, criterion, device)\n",
    "    print(\n",
    "        f\"Validation Epoch: {epoch_name}\\nVal Loss: {val_loss}\\nTop 1 Acc: {val_acc}\\n\"\n",
    "    )\n",
    "\n",
    "manager.finalize(model)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 6 - Viewing Model Sparsity\n",
    "To see the effects of sparse-quantized transfer learning, in this step, you will print out the sparsities of each Conv and FC layer in your model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sparseml.pytorch.utils import get_prunable_layers, tensor_sparsity\n",
    "\n",
    "# print sparsities of each layer\n",
    "for (name, layer) in get_prunable_layers(model):\n",
    "    print(f\"{name}.weight: {tensor_sparsity(layer.weight).item():.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 7 - Exporting to ONNX\n",
    "\n",
    "Now that the sparse-quantized transfer learning is complete, it should be prepped for inference.  A common next step for inference is exporting the model to ONNX.  This is also the format used by the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse) to achieve the sparse-quantized speedups.\n",
    "\n",
    "For PyTorch, exporting to ONNX is natively supported. In the cell block below, a convenience class, ModuleExporter(), is used to handle exporting.\n",
    "\n",
    "Additionally, PyTorch, exports a graph setup for quantization-aware training (QAT) to ONNX. To run a fully quantized graph, you will need to convert these QAT operations to fully quantized INT8 operations.  SparseML provides the `quantize_torch_qat_export` helper function to perform this conversion.\n",
    "\n",
    "Once the model is saved as an ONNX ﬁle, it is ready to be used for inference with the DeepSparse Engine.\n",
    "\n",
    "Normally, exporting a QAT model from PyTorch to ONNX will create a graph with \"fake quantized\" operations that represent the QAT graph.  By setting `convert_qat=True` in our exporter, a function will automatically be called to convert this exported model to a fully quantized graph that will contain desired quantized structure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from sparseml.pytorch.utils import ModuleExporter\n",
    "\n",
    "save_dir = \"pytorch_sparse_quantized_transfer_learning\"\n",
    "quant_onnx_graph_name = \"resnet50_imagenette_pruned_quant.onnx\"\n",
    "quantized_onnx_path = os.path.join(save_dir, quant_onnx_graph_name)\n",
    "\n",
    "exporter = ModuleExporter(model, output_dir=save_dir)\n",
    "exporter.export_pytorch(name=\"resnet50_imagenette_pruned_qat.pth\")\n",
    "exporter.export_onnx(\n",
    "    torch.randn(1, 3, 224, 224), name=quant_onnx_graph_name, convert_qat=True\n",
    ")\n",
    "\n",
    "print(f\"Sparse-Quantized ONNX model saved to {quantized_onnx_path}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## [Optional] Step 8 - Benchmarking\n",
    "\n",
    "Finally, to see the total effect of these optimizations, you will benchmark an unoptimized, dense ResNet-50 model from SparseZoo against your sparse-quantized model using the `deepsparse` API.\n",
    "\n",
    "To run this step `deepsparse` must be installed in your python environment. You can install it with `pip install deepsparse`.\n",
    "\n",
    "Note, in order to view speedup from quantization, your CPU must run VNNI instructions.  The benchmarking cell below contains a check for VNNI instructions and will log a warning if they are not detected.  You can learn more about DeepSparse hardware compatibility [here](https://docs.neuralmagic.com/deepsparse/hardware.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy\n",
    "from deepsparse import benchmark_model\n",
    "from deepsparse.cpu import cpu_architecture\n",
    "\n",
    "\n",
    "# check VNNI\n",
    "if cpu_architecture()[\"vnni\"]:\n",
    "    print(\"VNNI extensions detected, model will run with quantized speedups\\n\")\n",
    "else:\n",
    "    print(\n",
    "        \"WARNING: No VNNI extensions detected. Your model will not run with \"\n",
    "        \"quantized speedups which will affect benchmarking\\n\"\n",
    "    )\n",
    "\n",
    "\n",
    "BATCH_SIZE = 64\n",
    "NUM_CORES = None  # maximum number of cores available\n",
    "NUM_ITERATIONS = 100\n",
    "NUM_WARMUP_ITERATIONS = 20\n",
    "\n",
    "\n",
    "def benchmark_imagenette_model(model_name, model_path):\n",
    "    print(\n",
    "        f\"Benchmarking {model_name} for {NUM_ITERATIONS} iterations at batch \"\n",
    "        f\"size {BATCH_SIZE} with {NUM_CORES} CPU cores\"\n",
    "    )\n",
    "    sample_input = [\n",
    "        numpy.ascontiguousarray(\n",
    "            numpy.random.randn(BATCH_SIZE, 3, 224, 224).astype(numpy.float32)\n",
    "        )\n",
    "    ]\n",
    "\n",
    "    results = benchmark_model(\n",
    "        model=model_path,\n",
    "        inp=sample_input,\n",
    "        batch_size=BATCH_SIZE,\n",
    "        num_cores=NUM_CORES,\n",
    "        num_iterations=NUM_ITERATIONS,\n",
    "        num_warmup_iterations=NUM_WARMUP_ITERATIONS,\n",
    "        show_progress=True,\n",
    "    )\n",
    "    print(f\"results:\\n{results}\")\n",
    "    return results\n",
    "\n",
    "\n",
    "# base ResNet-50 Imagenette model downloaded from SparseZoo\n",
    "base_results = benchmark_imagenette_model(\n",
    "    \"ResNet-50 Imagenette Base\",\n",
    "    \"zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenette/base-none\"\n",
    ")\n",
    "\n",
    "optimized_results = benchmark_imagenette_model(\n",
    "    \"ResNet-50 Imagenette pruned-quantized\", quantized_onnx_path\n",
    ")\n",
    "\n",
    "speed_up = base_results.ms_per_batch / optimized_results.ms_per_batch\n",
    "print(f\"Speed-up from sparse-quantized transfer learning: {speed_up}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next Steps\n",
    "\n",
    "Congratulations, you have created a sparse-quantized model and exported it to ONNX for inference!  Next steps you can pursue include:\n",
    "* Transfer learning, pruning, or quantizing different models using SparseML\n",
    "* Trying different pruning and optimization recipes\n",
    "* Benchmarking other models on the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
